10 - Deep Learning [ID:9912]

So today we have a slightly busy schedule because today's topic is going to be deep

reinforcement learning.

And deep reinforcement learning is a very, very nice technique because you can really

learn strategies with that.

It's very useful.

We've seen that there are these powerful systems that learn to play Atari games and Go and

so on.

And we will talk about them today.

Note that you actually could fill an entire lecture series with this topic.

So we will try to get across the concepts that we need to do reinforcement learning.

And then in the end, you will see how we can adapt this to deep learning.

Okay, so the outline is: first we will talk about sequential decision making.

Then we will really introduce the concept of reinforcement learning.

This will then need the definition of a Markov decision process, the concept of policy iteration

and other solution methods.

And then we will talk about deep reinforcement learning, deep Q-learning, AlphaGo,

AlphaGo Zero, the stuff we've been waiting for.

So let's start with the sequential decision making.

And we start with the very basics.

So consider this multi-armed bandit problem.

So we have several decisions to make and we essentially have an action that we can take.

So let's say we have four slot machines, and

you want to select one of the slot machines where you put your coin and try your luck.

And now you want to figure out which of the machines, that is, which action, is the

best one that you can take.

So in the end we want to have some definition of what is good, and then we want

to choose, in this context, what is best.

So we have an action a_t at a time t.

So there's not just a single thing happening, but a sequence of those actions, one at every

time point t.

And the set A describes the actions that you can possibly take.

So each action a at time t has its own unknown probability density function.

And this unknown probability density function generates a reward r_t.

So the reward at the time t is essentially what you win.

It could be the amount of money that you're winning.

And we have an unknown probability density function that is conditioned on

the action a. Depending on which action I choose, I get a different reward, but I don't know

what this probability density function looks like, and it can be different for each action.

So this gives us action and reward.
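
As a minimal sketch of the setup just described, the four slot machines can be modeled in Python like this. The class name SlotMachineBandit, the Gaussian reward model, and the mean values are illustrative assumptions and not part of the lecture; the only point is that each action has its own reward distribution that the player cannot see.

```python
import random


class SlotMachineBandit:
    """Toy multi-armed bandit: each action has its own hidden reward distribution."""

    def __init__(self, mean_rewards, noise_std=1.0, seed=0):
        # The true means are hidden from the player (illustrative values only).
        self._means = list(mean_rewards)
        self._noise_std = noise_std
        self._rng = random.Random(seed)

    @property
    def num_actions(self):
        # Size of the finite action set A.
        return len(self._means)

    def pull(self, action):
        """Take action a_t and return a reward r_t drawn from p(r | a_t)."""
        return self._rng.gauss(self._means[action], self._noise_std)


# Four slot machines; the player only ever observes the sampled rewards.
bandit = SlotMachineBandit(mean_rewards=[0.1, 0.5, 0.3, 0.9])
rewards = [bandit.pull(action=3) for _ in range(5)]
print(rewards)
```

Pulling the same arm repeatedly gives different rewards each time, which is why a single try does not tell you which machine is best.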

These are two very important concepts and from that now we can derive something that

we call a policy.

Now the policy is something that we use to choose an action.

So we use π here: π(a) is the policy, and this is some way of formalizing how

to choose an action.

The policy tells us which action to take.

We have actions from a finite set and we have some rewards that we can generate if we actually

take this action.
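
One way to picture a policy over a finite action set is as a list of probabilities, one per action, from which an action is drawn. The following Python sketch assumes exactly that representation; the function name sample_action and the two example policies are hypothetical and only serve as illustration.

```python
import random


def sample_action(policy, rng):
    """Draw an action index a with probability policy[a]."""
    return rng.choices(range(len(policy)), weights=policy, k=1)[0]


# A uniform random policy over four actions (four slot machines).
uniform_policy = [0.25, 0.25, 0.25, 0.25]
# A deterministic policy that always picks the action with index 2.
always_third = [0.0, 0.0, 1.0, 0.0]

rng = random.Random(0)
actions = [sample_action(uniform_policy, rng) for _ in range(10)]
print(actions)
print(sample_action(always_third, rng))
```

A deterministic policy is just the special case in which one action has probability 1.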

Okay, so now how can we do this?

Well of course we want to find an action that is to some extent good.

And here we choose to maximize the expected reward over time t.
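
One common way to write this goal down is given here as a sketch, not necessarily the exact formula on the slide. The horizon T (the number of time steps played) is a symbol assumed for illustration; a_t, r_t, and π are the action, reward, and policy from above.

```latex
\[
  \max_{\pi} \; \mathbb{E}_{a_t \sim \pi(a)} \left[ \frac{1}{T} \sum_{t=1}^{T} r_t \right]
\]
```

In words: among all policies, we look for the one whose actions yield the highest average reward in expectation.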

Part of a video series:

Deep Learning

Presenters

Prof. Dr.-Ing. Andreas Maier

Accessible via

Open access

Duration

00:00:00 Min

Recording date

2018-12-18

Uploaded on

2019-04-12 16:47:58

Language

en-US

Deep Learning (DL) has attracted much interest in a wide range of applications such as image recognition, speech recognition and artificial intelligence, both from academia and industry. This lecture introduces the core elements of neural networks and deep learning. It comprises:

(multilayer) perceptron, backpropagation, fully connected neural networks
loss functions and optimization strategies
convolutional neural networks (CNNs)
activation functions
regularization strategies
common practices for training and evaluating neural networks
visualization of networks and results
common architectures, such as LeNet, AlexNet, VGG, GoogLeNet
recurrent neural networks (RNN, TBPTT, LSTM, GRU)
deep reinforcement learning
unsupervised learning (autoencoder, RBM, DBM, VAE)
generative adversarial networks (GANs)
weakly supervised learning
applications of deep learning (segmentation, object detection, speech recognition, ...)
